Exploring the Vector Space Model for Finding Verb Synonyms in Portuguese

Authors

  • Luís Sarmento
  • Paula Carvalho
  • Eugénio C. Oliveira
Abstract

We explore the performance of the Vector Space Model (VSM) in finding verb synonyms in Portuguese by analyzing the impact of three operating parameters: (i) the weighting function, (ii) the context window used for automatically extracting features, and (iii) the minimum number of vector features. We rely on distributional statistics taken from a large n-gram database to build feature vectors, using minimal linguistic pre-processing. Automatic evaluation of synonym candidates using gold-standard information from the OpenOffice and Wiktionary thesauri shows that low-frequency features carry most of the information regarding verb similarity, and that a [0, +2] window carries more information than a [-2, 0] window. We show that satisfactory precision levels require vectors with 50 or more non-nil components. Manual evaluation over a set of declarative verbs and psychological verbs shows that VSM-based approaches achieve good precision in finding verb synonyms for Portuguese, even when using minimal linguistic knowledge. This leads us to propose a performance baseline for this task.
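The three parameters named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which weighting functions were compared, so positive pointwise mutual information (PPMI) is used here purely as an assumed example, and the function names, the toy n-grams, and the choice to count non-nil components after weighting are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_vectors(ngrams, targets, window=(0, 2)):
    """Collect context features for target verbs from (tokens, count) n-grams.

    A feature is an (offset, word) pair for every position from window[0]
    to window[1] relative to the verb, skipping offset 0 (the verb itself).
    """
    lo, hi = window
    vectors = defaultdict(Counter)
    for tokens, count in ngrams:
        for i, tok in enumerate(tokens):
            if tok not in targets:
                continue
            for off in range(lo, hi + 1):
                j = i + off
                if off != 0 and 0 <= j < len(tokens):
                    vectors[tok][(off, tokens[j])] += count
    return vectors


def ppmi_weight(vectors):
    """Assumed weighting function: keep only features with positive PMI."""
    total = sum(sum(v.values()) for v in vectors.values())
    feature_totals = Counter()
    for v in vectors.values():
        feature_totals.update(v)
    weighted = {}
    for verb, v in vectors.items():
        verb_total = sum(v.values())
        weights = {}
        for feat, c in v.items():
            pmi = math.log((c * total) / (verb_total * feature_totals[feat]))
            if pmi > 0:
                weights[feat] = pmi
        weighted[verb] = weights
    return weighted


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def synonym_candidates(verb, weighted, min_features=50, top_k=5):
    """Rank other verbs by cosine similarity, skipping sparse vectors."""
    if verb not in weighted:
        return []
    base = weighted[verb]
    scored = [
        (cosine(base, vec), other)
        for other, vec in weighted.items()
        if other != verb and len(vec) >= min_features
    ]
    return [other for _, other in sorted(scored, reverse=True)[:top_k]]


# Toy data (invented for illustration; not taken from the paper's n-gram database).
toy_ngrams = [
    (("dizer", "que", "sim"), 10),
    (("afirmar", "que", "sim"), 8),
    (("dizer", "a", "verdade"), 3),
    (("afirmar", "a", "verdade"), 2),
    (("comer", "o", "bolo"), 5),
    (("comer", "a", "sopa"), 4),
]
toy_targets = {"dizer", "afirmar", "comer"}
toy_weighted = ppmi_weight(build_vectors(toy_ngrams, toy_targets, window=(0, 2)))
print(synonym_candidates("dizer", toy_weighted, min_features=3, top_k=1))
```

Passing `window=(-2, 0)` collects left-hand context instead of right-hand context, which is the comparison the abstract reports. The toy threshold of 3 stands in for the 50-component minimum, which only makes sense with a much larger n-gram collection.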

Similar resources

Reachability checking in complex and concurrent software systems using intelligent search methods

Software system verification is an efficient technique for ensuring the correctness of a software product, especially in safety-critical systems in which a small bug may have disastrous consequences. The goal of software verification is to ensure that the product fulfills the requirements. Studies show that the cost of finding and fixing errors in design time is less than finding and fixing the...

Finding High-Frequent Synonyms of A Domain-Specific Verb in English Sub-Language of MEDLINE Abstracts Using WordNet

The task of binary relation extraction in IE [3] is based mainly on high-frequent verbs and patterns. During the extraction of a specific relation from MEDLINE English abstracts, it is noticed that besides the high-frequent verb itself which represents the specific relation, some other word forms, such as the nominal and adjective forms of this verb, as well as its synonyms, also play a very im...

Space Vector Control Scheme of Three Level ZSI Applied to Wind Energy Systems

In this paper the Space Vector Control Scheme is implemented for a Wind Energy System using Three Level Impedance Source Inverter (ZSI). The wind energy system uses a Self Excited Induction generator (SEIG) which is the most emerging application in the field of Wind Energy Conversion System (WECS). The proposed system is modelled with a generator-side Diode Bridge Rectifier and a Stand-Alone si...

A Hybrid Meta-heuristic Approach to Cope with State Space Explosion in Model Checking Technique for Deadlock Freeness

Model checking is an automatic technique for software verification through which all reachable states are generated from an initial state to find errors and desirable patterns. In the model checking approach, the behavior and structure of the system should be modeled. Graph transformation system is a graphical formal modeling language to specify and model the system. However, modeling of large s...

Enriching a Portuguese WordNet using Synonyms from a Monolingual Dictionary

In this article we present an exploratory approach to enrich a WordNet-like lexical ontology with the synonyms present in a standard monolingual Portuguese dictionary. The dictionary was converted from PDF into XML and senses were automatically identified and annotated. This allowed us to extract them, independently of definitions, and to create sets of synonyms (synsets). These synsets were th...

Publication date: 2009